Matlab software for the profile regression analysis of the Gen-Air data 
in the Papathomas et al. (2010) manuscript 'Examining the joint effect of multiple risk factors: 
lung cancer in non-smokers'.
This software allows adjustment for confounders using a logistic regression model.
Confounders should be either continuous or binary variables; 
a categorical confounder with c categories can be accommodated by treating it as c-1 binary confounders.
No missing observations are allowed for the confounders.
----------------------------------------------------------------
Michail Papathomas, June 2010
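
The c-1 binary coding of a categorical confounder mentioned above can be sketched as follows. 
This is a minimal illustration only; the variable names (conf, levels, D) are hypothetical 
and are not taken from the provided files.

```matlab
% Hypothetical categorical confounder with c = 3 categories
conf = [1; 3; 2; 3; 1];

levels = unique(conf);     % sorted distinct categories
c = numel(levels);

% One 0-1 indicator column per non-reference category
% (the first category is taken as the reference)
D = zeros(numel(conf), c - 1);
for j = 2:c
    D(:, j - 1) = (conf == levels(j));
end

disp(D)                    % the c-1 binary confounders
```

The choice of reference category is arbitrary in this sketch.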

Requirements:
-------------
Running this software requires the "Statistics" Matlab toolbox.

Functions:
----------
This zip file contains, among others, the following MATLAB files:

batchforGenair.m 

	This is the main batch file where
	 (1) The data file is loaded and the required information on the data is provided.
	 (2) Parameters related to the MCMC sampling are set (iterations, thinning, burn-in). 
	This file also calls the main functions required for 
	 (3) Running the MCMC sampling algorithm, 
	 (4) Obtaining the 'best' average partition Zbest using the produced similarity matrix and PAM.
	     There is also the option of obtaining Zbest by choosing the clustering that minimizes
	     the distance from the similarity matrix ('least squares' method),
	     and the option of applying hierarchical clustering to the similarity matrix.
	 (5) Post-processing the MCMC output and making inferences for the groups in Zbest
	     through model averaging.
	For more details, see the instructive comments in the file.
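
As a rough sketch of step (4), a similarity matrix can be built from the sampled allocations 
and the 'least squares' partition chosen from among them. The allocation matrix Z below is 
made up for illustration, and the code is an assumption about how such a step could look, 
not the code in batchforGenair.m.

```matlab
% Hypothetical allocations: one row per retained MCMC sweep, one column per subject
Z = [1 1 2 2;
     1 1 2 2;
     1 1 1 2;
     1 2 2 2];
[T, n] = size(Z);

% Similarity matrix: proportion of sweeps in which two subjects share a cluster
S = zeros(n);
for t = 1:T
    S = S + double(bsxfun(@eq, Z(t, :)', Z(t, :)));
end
S = S / T;

% 'Least squares' choice: the sampled partition whose 0-1 co-clustering
% matrix is closest to S in squared distance
bestDist = Inf;
for t = 1:T
    A = double(bsxfun(@eq, Z(t, :)', Z(t, :)));
    d = sum(sum((A - S).^2));
    if d < bestDist
        bestDist = d;
        Zbest = Z(t, :);
    end
end

disp(S)
disp(Zbest)
```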


mainnointwithordinalarand.m

	This function performs the MCMC Gibbs sampling.
	Note: The function was written with provision for sampling 0-1 switches in order  
	      to perform variable selection. However, variable selection was 
	      not performed in the relevant manuscript, so the sampling of
	      switches has been removed from this version of the function. 
	Note: The 'sigmasqforgamma' parameter controls the acceptance rate in the 
	      sampling of the threshold values for ordinal covariates. 
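
To illustrate how a parameter of this kind can control an acceptance rate, the toy example 
below runs a random-walk Metropolis sampler with three proposal variances. It assumes that 
'sigmasqforgamma' acts as a proposal variance, which the name suggests but which is not 
stated here; the target density and all variable names are hypothetical and unrelated to 
the threshold update in the function itself.

```matlab
rng(1);                               % fix the seed so the run is repeatable
logTarget = @(x) -0.5 * x.^2;         % toy target: standard normal (up to a constant)
sigmasqValues = [0.01 1 100];         % hypothetical proposal variances to compare
nIter = 5000;
accRate = zeros(size(sigmasqValues));

for s = 1:numel(sigmasqValues)
    x = 0;
    nAccept = 0;
    for it = 1:nIter
        % Random-walk proposal centred at the current value
        xProp = x + sqrt(sigmasqValues(s)) * randn;
        % Metropolis accept/reject step on the log scale
        if log(rand) < logTarget(xProp) - logTarget(x)
            x = xProp;
            nAccept = nAccept + 1;
        end
    end
    accRate(s) = nAccept / nIter;
    fprintf('proposal variance %g: acceptance rate %.2f\n', ...
            sigmasqValues(s), accRate(s));
end
```

In a sampler of this type, a smaller proposal variance generally gives a higher acceptance 
rate but smaller moves, and a larger one gives the reverse.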


pam.m

	Implements the Partitioning Around Medoids approach.
	Uses the generated similarity matrix to produce the 'best' average partition Zbest.
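
The idea of clustering around medoids can be sketched on a small similarity matrix. The code 
below is a simplified alternating k-medoids iteration, not the full PAM build and swap 
procedure and not the code in pam.m; the matrix S, the number of groups k and the starting 
medoids are all made up for illustration.

```matlab
% Hypothetical similarity matrix for four subjects
S = [1.0 0.9 0.1 0.2;
     0.9 1.0 0.2 0.1;
     0.1 0.2 1.0 0.8;
     0.2 0.1 0.8 1.0];
D = 1 - S;                  % turn similarities into dissimilarities
k = 2;                      % number of groups (assumed known here)
medoids = [1 3];            % arbitrary starting medoids

for iter = 1:20
    % Assign every subject to its nearest medoid
    [~, labels] = min(D(:, medoids), [], 2);
    newMedoids = medoids;
    for g = 1:k
        members = find(labels == g);
        if isempty(members)
            continue;
        end
        % New medoid: the member with the smallest total dissimilarity
        % to the other members of its group
        [~, idx] = min(sum(D(members, members), 1));
        newMedoids(g) = members(idx);
    end
    if isequal(newMedoids, medoids)
        break;              % medoids unchanged, so the iteration has converged
    end
    medoids = newMedoids;
end

[~, Zbest] = min(D(:, medoids), [], 2);
disp(Zbest')
```

An alternating scheme like this can stop at a poor local optimum for some starting medoids, 
which is one reason the full PAM algorithm also considers swaps.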


reproduceparam6logistic.m
	
	Post-processes the MCMC output and, through model averaging, provides inferences for 
	the sub-populations in Zbest. 
	It provides point estimates (mean or median) and 95% credible intervals for:
		The risk effect parameters \theta_k in each group k
		The profile probabilities \Phi_k in each group k
		The difference between each \Phi_k and the average corresponding probability 
			in the whole sample
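
A point estimate and a 95% credible interval can be read off a vector of posterior draws as 
in the sketch below. The draws are simulated for illustration and the variable names are 
hypothetical; the sketch does not reproduce the model averaging carried out by 
reproduceparam6logistic.m.

```matlab
rng(2);                                    % fix the seed so the run is repeatable
thetaDraws = 0.5 + 0.3 * randn(2000, 1);   % hypothetical posterior draws for one group

pointMean   = mean(thetaDraws);            % posterior mean
pointMedian = median(thetaDraws);          % posterior median
ci95 = prctile(thetaDraws, [2.5 97.5]);    % prctile is part of the Statistics toolbox

fprintf('mean %.3f, median %.3f, 95%% CI (%.3f, %.3f)\n', ...
        pointMean, pointMedian, ci95(1), ci95(2));
```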
		
		
Data set:
---------

A mock, randomly generated GenAir data set is provided (pretend_genair_data.mat). 
It can be used as an example for the implementation of the provided profile regression software. 
Missing observations should be coded as -999. As noted above, no missing observations 
are allowed for the confounders.
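
When preparing your own data, missing values can be recoded and the confounders checked 
along the lines of the sketch below. The matrices X (covariates) and W (confounders) are 
hypothetical and do not refer to the variables stored in pretend_genair_data.mat.

```matlab
% Hypothetical covariate matrix with missing entries stored as NaN
X = [1 NaN 3;
     4 5   NaN];
X(isnan(X)) = -999;          % recode missing covariate values as -999

% Hypothetical confounder matrix: must contain no missing values
W = [0.2 1;
     1.5 0];
if any(isnan(W(:))) || any(W(:) == -999)
    error('Confounders must not contain missing observations.');
end

disp(X)
```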

